Bayesian Large-scale Multiple Regression with Summary Statistics from Genome-wide Association Studies.

نویسندگان

  • Xiang Zhu
  • Matthew Stephens
چکیده

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a "Regression with Summary Statistics" (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Supplement to “bayesian Large-scale Multiple Regression with Summary Statistics from Genome-wide Association Studies”

A Theoretical derivation of RSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 A.1 Proofs of propositions in main text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 A.1.1 Proof of Proposition 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 A.1.2 Proof of Proposition 2.2 . . . . . . . . . . . . . . ....

متن کامل

Analysis of genome-wide association data by large-scale Bayesian logistic regression

Single-locus analysis is often used to analyze genome-wide association (GWA) data, but such analysis is subject to severe multiple comparisons adjustment. Multivariate logistic regression is proposed to fit a multi-locus model for case-control data. However, when the sample size is much smaller than the number of single-nucleotide polymorphisms (SNPs) or when correlation among SNPs is high, tra...

متن کامل

JAM: A Scalable Bayesian Framework for Joint Analysis of Marginal SNP Effects

Recently, large scale genome-wide association study (GWAS) meta-analyses have boosted the number of known signals for some traits into the tens and hundreds. Typically, however, variants are only analysed one-at-a-time. This complicates the ability of fine-mapping to identify a small set of SNPs for further functional follow-up. We describe a new and scalable algorithm, joint analysis of margin...

متن کامل

Bayesian Variable Selection Regression for Genome - Wide Association Studies , and Other Large - Scale Problems

We consider applying Bayesian Variable Selection Regression, or BVSR, to genome-wide association studies, and similar large-scale regression problems. Currently, typical genome-wide association studies measure hundreds of thousands, or millions, of genetic variants (SNPs), in thousands or tens of thousands of individuals, and attempt to identify regions harboring SNPs that affect some phenotype...

متن کامل

A pathway analysis method for genome-wide association studies.

For genome-wide association studies, we propose a new method for identifying significant biological pathways. In this approach, we aggregate data across single-nucleotide polymorphisms to obtain summary measures at the gene level. We then use a hierarchical Bayesian model, which takes the gene-level summary measures as data, in order to evaluate the relevance of each pathway to an outcome of in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • The annals of applied statistics

دوره 11 3  شماره 

صفحات  -

تاریخ انتشار 2017